Ensure swap_nonoverlapping is really always untyped #137412

Merged
merged 3 commits into from
Apr 10, 2025

Conversation

scottmcm
Member

@scottmcm scottmcm commented Feb 22, 2025

This replaces #134954, which was arguably overcomplicated.

Fixes #134713

Actually using the type passed to `ptr::swap_nonoverlapping` for anything other than its size + align turns out to not work, so this goes back to always erasing the types down to just bytes.

(Except in `const`, which keeps doing the same thing as before to preserve @RalfJung's fix from #134689)
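
To make "erasing the types down to just bytes" concrete, here is a minimal sketch of a swap that uses `T` only for its size. It is not the standard library's implementation: `swap_untyped_sketch` and `swap_through_bool_pointers` are hypothetical names invented for this illustration, and the byte-by-byte loop is deliberately naive.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Hypothetical helper (NOT the standard library's code): swaps `count`
/// values of `T` by treating both regions as raw, possibly uninitialized bytes.
///
/// # Safety
/// Both pointers must be valid for reads and writes of `count` values of `T`,
/// and the two regions must not overlap.
pub unsafe fn swap_untyped_sketch<T>(x: *mut T, y: *mut T, count: usize) {
    // Only the size of `T` is used; its validity rules are never consulted.
    let bytes = mem::size_of::<T>() * count;
    let x = x.cast::<MaybeUninit<u8>>();
    let y = y.cast::<MaybeUninit<u8>>();
    for i in 0..bytes {
        unsafe {
            // `MaybeUninit<u8>` may hold any byte, initialized or not, so
            // padding and otherwise-invalid bit patterns are carried over as-is.
            let a = ptr::read(x.add(i));
            let b = ptr::read(y.add(i));
            ptr::write(x.add(i), b);
            ptr::write(y.add(i), a);
        }
    }
}

/// Swaps two `u8`s through `*mut bool` pointers. 5 and 6 are not valid
/// `bool`s, which is fine here because no `bool` value is ever read.
pub fn swap_through_bool_pointers() -> (u8, u8) {
    let mut a = 5_u8;
    let mut b = 6_u8;
    unsafe {
        swap_untyped_sketch(
            (&mut a as *mut u8).cast::<bool>(),
            (&mut b as *mut u8).cast::<bool>(),
            1,
        );
    }
    (a, b)
}
```

A typed swap would instead read each side as a `T`, which is where invalid bit patterns and padding bytes become a problem.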

Fixes #134946

I'd previously moved the swapping to use auto-vectorization *on bytes*, but someone pointed out on Discord that the tail loop handling from that left a whole bunch of byte-by-byte swapping around. This goes back to manual tail handling to avoid that, then still triggers auto-vectorization on pointer-width values. (So you'll see `<4 x i64>` on `x86-64-v3` for example.)
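
The general shape of "pointer-width main loop plus manual tail" can be sketched as follows. This is an illustration under my own assumptions, not the code from this PR: the names `swap_chunk` and `swap_bytes_chunked_sketch` are hypothetical, and the sketch uses unaligned reads and writes so it works for any input, whereas a real implementation can exploit the alignment of `T`.

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

/// Swaps one `Chunk`-sized piece at byte offset `off` (hypothetical helper).
///
/// # Safety
/// `x + off` and `y + off` must be valid for `size_of::<Chunk>()` bytes,
/// and the two regions must not overlap.
unsafe fn swap_chunk<Chunk>(x: *mut u8, y: *mut u8, off: usize) {
    unsafe {
        let px = x.add(off).cast::<MaybeUninit<Chunk>>();
        let py = y.add(off).cast::<MaybeUninit<Chunk>>();
        // Unaligned accesses keep the sketch independent of the caller's alignment.
        let a = ptr::read_unaligned(px);
        let b = ptr::read_unaligned(py);
        ptr::write_unaligned(px, b);
        ptr::write_unaligned(py, a);
    }
}

/// Swaps `bytes` bytes between two non-overlapping regions.
///
/// # Safety
/// Both pointers must be valid for reads and writes of `bytes` bytes,
/// and the two regions must not overlap.
pub unsafe fn swap_bytes_chunked_sketch(x: *mut u8, y: *mut u8, bytes: usize) {
    let word = mem::size_of::<usize>();
    let mut off = 0;
    // Main loop over whole pointer-width words: a simple loop of this shape
    // is what an optimizer can widen into vector operations.
    while off + word <= bytes {
        unsafe { swap_chunk::<usize>(x, y, off) };
        off += word;
    }
    // Manual tail: fewer than `word` bytes remain, so at most one swap per
    // smaller power-of-two width is needed instead of a byte-by-byte loop.
    let rem = bytes - off;
    if rem & 4 != 0 {
        unsafe { swap_chunk::<u32>(x, y, off) };
        off += 4;
    }
    if rem & 2 != 0 {
        unsafe { swap_chunk::<u16>(x, y, off) };
        off += 2;
    }
    if rem & 1 != 0 {
        unsafe { swap_chunk::<u8>(x, y, off) };
    }
}
```

On a 64-bit target the tail is at most 7 bytes, so it costs at most three swaps (4 + 2 + 1) rather than up to seven single-byte swaps.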

@rustbot
Collaborator

rustbot commented Feb 22, 2025

r? @ibraheemdev

rustbot has assigned @ibraheemdev.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 22, 2025
@rustbot
Collaborator

rustbot commented Feb 22, 2025

The Miri subtree was changed

cc @rust-lang/miri

@ibraheemdev
Member

Passing this one along because I'm not the best person to review this. r? libs

@rustbot rustbot assigned jhpratt and unassigned ibraheemdev Feb 26, 2025
@jhpratt
Member

jhpratt commented Feb 26, 2025

I'm mostly sticking to the trivial PRs at the moment as my review capacity is limited. Re-rolling again.

r? libs

@rustbot rustbot assigned cuviper and unassigned jhpratt Feb 26, 2025
Member

@RalfJung RalfJung left a comment

The code changes LGTM for the non-const path; see the comment for the const path.

I did not look at the tests; I think that needs an LLVM/codegen expert. @nikic maybe?

@RalfJung
Member

RalfJung commented Feb 27, 2025 via email

Comment on lines +69 to +79
// Ensure we do better than a long run of byte copies,
// see <https://github.com/rust-lang/rust/issues/134946>

// CHECK-NOT: movb
// CHECK-COUNT-8: movups{{.+}}xmm
// CHECK-NOT: movb
// CHECK-COUNT-4: movq
// CHECK-NOT: movb
// CHECK-COUNT-4: movl
// CHECK-NOT: movb
// CHECK: retq
Member Author

Note for reviewers: the codegen tests here are mostly about demonstrating what actually happens on a variety of types; the exact details don't matter that much.

Reviewing the Rust code is enough to know that LLVM will swap it. What this particular test is trying to show, for example, is that the output is not just a huge row of `movb`s like you can see in https://rust.godbolt.org/z/MKfxn1Tjr
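
For readers who want to inspect the generated assembly themselves, a function along these lines can be pasted into a compiler explorer with optimizations enabled. It is a sketch, not taken from the test file: the `Payload` type and `swap_payloads` name are hypothetical, and the 44-byte size is an assumption picked so that the value splits into 16-byte, 8-byte and 4-byte pieces and exercises both the vectorized body and the tail handling.

```rust
use std::ptr;

/// Hypothetical payload type; the size is an assumption chosen for illustration.
pub type Payload = [u8; 44];

/// Kept out of line so that its assembly is easy to find in the output.
#[inline(never)]
pub fn swap_payloads(x: &mut Payload, y: &mut Payload) {
    // SAFETY: two distinct `&mut` references can never overlap, and each is
    // valid for reads and writes of exactly one `Payload`.
    unsafe { ptr::swap_nonoverlapping(x as *mut Payload, y as *mut Payload, 1) }
}
```

The exact instructions depend on the compiler version and target features, so treat any particular listing as an observation rather than a guarantee.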

@scottmcm scottmcm force-pushed the redo-swap branch 2 times, most recently from c1b9092 to 9d37b4b Compare March 1, 2025 05:57
@bors
Collaborator

bors commented Mar 1, 2025

☔ The latest upstream changes (presumably #137848) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Collaborator

bors commented Mar 7, 2025

☔ The latest upstream changes (presumably #138155) made this pull request unmergeable. Please resolve the merge conflicts.

@scottmcm
Member Author

scottmcm commented Apr 9, 2025

@bors try

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 9, 2025
Ensure `swap_nonoverlapping` is really always untyped


try-jobs: x86_64-gnu-distcheck
@bors
Collaborator

bors commented Apr 9, 2025

⌛ Trying commit b06a88f with merge bef076d...

In `swap_nonoverlapping_short` there's a new `debug_assert!`, and if that's enabled then the `alloca`s don't optimize out.
@scottmcm
Member Author

scottmcm commented Apr 9, 2025

Only change is skipping the test when the new debug_assert! isn't disabled.
@bors r=cuviper rollup=iffy (yay codegen tests -- ought to work now, but...)

@bors
Collaborator

bors commented Apr 9, 2025

📌 Commit 63dcac8 has been approved by cuviper

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 9, 2025
Zalathar added a commit to Zalathar/rust that referenced this pull request Apr 10, 2025
Ensure `swap_nonoverlapping` is really always untyped

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 10, 2025
Rollup of 18 pull requests

Successful merges:

 - rust-lang#137412 (Ensure `swap_nonoverlapping` is really always untyped)
 - rust-lang#138167 (Small code improvement in rustdoc hidden stripper)
 - rust-lang#138605 (Clean up librustdoc::html::render to be better encapsulated)
 - rust-lang#138682 (Allow drivers to supply a list of extra symbols to intern)
 - rust-lang#138904 (Test linking and running `no_std` binaries)
 - rust-lang#139423 (Suppress missing field error when autoderef bottoms out in infer)
 - rust-lang#139449 (match ergonomics: replace `peel_off_references` with a recursive call)
 - rust-lang#139507 (compiletest: Trim whitespace from environment variable names)
 - rust-lang#139530 (Remove some dead or leftover code related to rustc-intrinsic abi removal)
 - rust-lang#139560 (fix title of offset_of_enum feature)
 - rust-lang#139563 (emit a better error message for using the macro incorrectly)
 - rust-lang#139568 (Don't use empty trait names)
 - rust-lang#139580 (Temporarily leave the review rotation)
 - rust-lang#139589 (saethlin is back from vacation)
 - rust-lang#139592 (rustdoc: Enable Markdown extensions when looking for doctests)
 - rust-lang#139599 (Tracking issue template: fine-grained information on style update status)
 - rust-lang#139600 (Update `compiler-builtins` to 0.1.153)
 - rust-lang#139606 (Update compiletest to Edition 2024)

r? `@ghost`
`@rustbot` modify labels: rollup
Zalathar added a commit to Zalathar/rust that referenced this pull request Apr 10, 2025
Ensure `swap_nonoverlapping` is really always untyped

bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 10, 2025
Rollup of 17 pull requests

Successful merges:

 - rust-lang#137412 (Ensure `swap_nonoverlapping` is really always untyped)
 - rust-lang#138167 (Small code improvement in rustdoc hidden stripper)
 - rust-lang#138605 (Clean up librustdoc::html::render to be better encapsulated)
 - rust-lang#138682 (Allow drivers to supply a list of extra symbols to intern)
 - rust-lang#138904 (Test linking and running `no_std` binaries)
 - rust-lang#139423 (Suppress missing field error when autoderef bottoms out in infer)
 - rust-lang#139449 (match ergonomics: replace `peel_off_references` with a recursive call)
 - rust-lang#139507 (compiletest: Trim whitespace from environment variable names)
 - rust-lang#139530 (Remove some dead or leftover code related to rustc-intrinsic abi removal)
 - rust-lang#139560 (fix title of offset_of_enum feature)
 - rust-lang#139563 (emit a better error message for using the macro incorrectly)
 - rust-lang#139568 (Don't use empty trait names)
 - rust-lang#139580 (Temporarily leave the review rotation)
 - rust-lang#139589 (saethlin is back from vacation)
 - rust-lang#139592 (rustdoc: Enable Markdown extensions when looking for doctests)
 - rust-lang#139599 (Tracking issue template: fine-grained information on style update status)
 - rust-lang#139600 (Update `compiler-builtins` to 0.1.153)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors
Collaborator

bors commented Apr 10, 2025

⌛ Testing commit 63dcac8 with merge 0fe8f34...

@bors
Collaborator

bors commented Apr 10, 2025

☀️ Test successful - checks-actions
Approved by: cuviper
Pushing 0fe8f34 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 10, 2025
@bors bors merged commit 0fe8f34 into rust-lang:master Apr 10, 2025
7 checks passed
@rustbot rustbot added this to the 1.88.0 milestone Apr 10, 2025
What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 2205455 (parent) -> 0fe8f34 (this PR)

Test differences

Show 161 test diffs

Stage 0

  • ptr::test_ptr_swap_nonoverlapping_is_untyped: [missing] -> pass (J0)

Stage 1

  • [codegen] tests/codegen/swap-small-types.rs: pass -> ignore (ignored when std is built with debug assertions ((ptr::swap_nonoverlapping has one which blocks some optimizations))) (J2)
  • ptr::test_ptr_swap_nonoverlapping_is_untyped: [missing] -> pass (J3)

Stage 2

  • [codegen] tests/codegen/swap-small-types.rs: ignore (only executed when the architecture is x86_64) -> ignore (ignored when std is built with debug assertions ((ptr::swap_nonoverlapping has one which blocks some optimizations))) (J1)
  • [codegen] tests/codegen/swap-small-types.rs: pass -> ignore (ignored when std is built with debug assertions ((ptr::swap_nonoverlapping has one which blocks some optimizations))) (J4)

Additionally, 156 doctest diffs were found. These are ignored, as they are noisy.

Job group index

Job duration changes

  1. x86_64-apple-2: 7698.0s -> 5540.7s (-28.0%)
  2. dist-x86_64-apple: 8790.2s -> 10668.2s (21.4%)
  3. dist-aarch64-apple: 4710.8s -> 5210.3s (10.6%)
  4. dist-powerpc-linux: 5754.1s -> 5238.7s (-9.0%)
  5. x86_64-apple-1: 9359.0s -> 8539.6s (-8.8%)
  6. dist-aarch64-msvc: 8678.7s -> 8043.2s (-7.3%)
  7. x86_64-gnu-llvm-19-3: 7215.9s -> 6788.7s (-5.9%)
  8. dist-x86_64-illumos: 5769.6s -> 6101.2s (5.7%)
  9. dist-x86_64-freebsd: 5029.0s -> 4776.9s (-5.0%)
  10. dist-aarch64-linux: 5463.4s -> 5240.3s (-4.1%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@rust-timer
Collaborator

Finished benchmarking commit (0fe8f34): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.4%, 1.7%] | 5 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.7% | [-1.1%, -0.3%] | 9 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.2% | [-1.1%, 1.7%] | 14 |

Max RSS (memory usage)

Results (primary 1.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.7% | [2.6%, 5.9%] | 3 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.5% | [-2.6%, -2.4%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 1.2% | [-2.6%, 5.9%] | 5 |

Cycles

Results (primary 0.7%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.7% | [0.7%, 0.7%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.7% | [0.7%, 0.7%] | 1 |

Binary size

Results (primary 0.0%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.3% | [0.1%, 0.8%] | 17 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.1% | [-0.3%, -0.0%] | 37 |
| Improvements ✅ (secondary) | -0.1% | [-0.1%, -0.1%] | 3 |
| All ❌✅ (primary) | 0.0% | [-0.3%, 0.8%] | 54 |

Bootstrap: 783.243s -> 781.425s (-0.23%)
Artifact size: 366.21 MiB -> 365.92 MiB (-0.08%)

@rustbot rustbot added the perf-regression Performance regression. label Apr 11, 2025
@Mark-Simulacrum Mark-Simulacrum added the perf-regression-triaged The performance regression has been triaged. label Apr 14, 2025
@Mark-Simulacrum
Member

Correctness fix, relatively small set of regressed scenarios, compared to number of improvements. Marking as triaged.

github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this pull request Apr 19, 2025
Ensure `swap_nonoverlapping` is really always untyped
